A Formal Approach to Memory Access Optimization: Data Layout, Reorganization, and Near-Data Processing

نویسنده

  • BERKIN AKIN
چکیده

The memory system is a major bottleneck in achieving high performance and energy efficiency for various processing platforms. This thesis aims to improve memory performance and energy efficiency of data intensive applications through a two-pronged approach which combines a formal representation framework and a hardware substrate that can efficiently reorganize data in memory. The proposed formal framework enables representing and systematically manipulating data layout formats, address mapping schemes, and memory access patterns through permutations to exploit the locality and parallelism in memory. Driven by the implications from the formal framework, this thesis presents the HAMLeT architecture for highly-concurrent, energy-efficient and low-overhead data reorganization performed completely in memory. Although data reorganization simply relocates data in memory, it is costly on conventional systems mainly due to inefficient access patterns, limited data reuse, and roundtrip data traversal throughout the memory hierarchy. HAMLeT pursues a near-data processing approach exploiting the 3D-stacked DRAM technology. Integrated in the logic layer, interfaced directly to the local controllers, it takes advantage of the internal fine-grain parallelism, high bandwidth and locality which are inaccessible otherwise. Its parallel streaming architecture can extract high throughput from stringent power, area, and thermal budgets. The thesis evaluates the efficient data reorganization capability provided by HAMLeT through several fundamental use cases. First, it demonstrates software-transparent data reorganization performed in memory to improve the memory access. A proposed hardware monitoring determines iii inefficient memory usage and issues a data reorganization to adapt an optimized data layout and address mapping for the observed memory access patterns. This mechanism is performed transparently and does not require any changes to the user software—HAMLeT handles the remapping and its side effects completely in hardware. Second, HAMLeT provides an efficient substrate to explicitly reorganize data in memory. This gives an ability to offload and accelerate common data reorganization routines observed in high-performance computing libraries (e.g., matrix transpose, scatter/gather, permutation, pack/unpack, etc.). Third, explicitly performed data reorganization enables considering the data layout and address mapping as a part of the algorithm design space. Exposing these memory characteristics to the algorithm design space creates opportunities for algorithm/architecture co-design. Co-optimized computation flow, memory accesses, and data layout lead to new algorithms that are conventionally avoided.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

In-memory Data Reorganization Performed in Parallel with Host Processor Memory Accesses and Provides Mechanisms to Handle Issues Such

......Data reorganization operations often appear as a critical building block in scientific computing applications, such as signal processing, molecular dynamics simulations, and linear algebra computations (for example, matrix transpose, pack and unpack, and shuffle). High-performance libraries, such as the Intel Math Kernel Library (https://software. intel.com/en-us/intel-mkl), generally pro...

متن کامل

Effect of Data Layout in the Evaluation Time of Non-Separable Functions on GPU

GPUs are able to provide a tremendous computational power, but their optimal usage requires the optimization of memory access. The many threads available can mitigate the long memory access latencies, but this usually demands a reorganization of the data and algorithm to reach the performance peak. The addressed problem is to know which data layout produces a faster evaluation when dealing with...

متن کامل

Dynamic Data Layouts for Cache-Conscious Factorization of DFT

Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimization techniques for computing the DFT rely on either modifying the computation and data access order or exploiting low level platform specific details, while keeping the data layout in memory static. In this paper, we propose a high level optimiza...

متن کامل

Gong, Zhenhuan. Multi-level Data Layout Optimization for Heterogeneous Access Patterns. (under the Direction of Dr. Nagiza F. Samatova.) Multi-level Data Layout Optimization for Heterogeneous Access Patterns

GONG, ZHENHUAN. Multi-level Data Layout Optimization for Heterogeneous Access Patterns. (Under the direction of Dr. Nagiza F. Samatova.) Recent years have seen an enormous increase in computation power of leadership computing facilities. As a result, huge amounts of data, from terascale to petascale, are being produced by scientific applications running on supercomputers. However, the I/O subsy...

متن کامل

Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach

There are several different methods to make an efficient strategy for steganalysis of digital images. A very powerful method in this area is rich model consisting of a large number of diverse sub-models in both spatial and transform domain that should be utilized. However, the extraction of a various types of features from an image is so time consuming in some steps, especially for training pha...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015